[codegen] Add max(half, half) support when enable fp16 #3811
Conversation
Hi @ZQPei, please check the CI error.

Hi @cchung100m

Hi @ZQPei For your PR, besides

How do I add a unit test?
Hi @cchung100m, here is the unit test:

```python
import numpy as np
import tvm
from tvm.contrib.nvcc import have_fp16


def test_cuda_vector_max():
    num_thread = 8
    target = 'cuda'

    def check_vector_max(ctx, n, dtype):
        if not tvm.gpu(0).exist or not tvm.module.enabled("cuda"):
            print("skip because cuda is not enabled..")
            return
        if dtype == "float16" and not have_fp16(tvm.gpu(0).compute_version):
            print("skip because gpu does not support fp16")
            return
        # Elementwise max of two input tensors.
        A = tvm.placeholder((n,), name='A', dtype=dtype)
        B = tvm.placeholder((n,), name='B', dtype=dtype)
        C = tvm.compute((n,), lambda i: tvm.max(A[i], B[i]), name='C')
        s = tvm.create_schedule(C.op)
        bx, tx = s[C].split(C.op.axis[0], factor=num_thread)
        s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
        s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
        fun = tvm.build(s, [A, B, C], "cuda", name="vector_max")
        np_a = np.random.uniform(size=n).astype(dtype)
        np_b = np.random.uniform(size=n).astype(dtype)
        np_c = np.maximum(np_a, np_b)
        a = tvm.nd.empty((n,), A.dtype, ctx).copyfrom(np_a)
        b = tvm.nd.empty((n,), B.dtype, ctx).copyfrom(np_b)
        c = tvm.nd.empty((n,), C.dtype, ctx)
        fun(a, b, c)
        np.testing.assert_equal(c.asnumpy(), np_c)

    ctx = tvm.context(target, 0)
    check_vector_max(ctx, 10, "float32")
    check_vector_max(ctx, 10, "float16")
```
LGTM
```diff
@@ -50,6 +50,8 @@ void CodeGenCUDA::AddFunction(LoweredFunc f) {
 std::string CodeGenCUDA::Finish() {
   if (enable_fp16_) {
     decl_stream << "#include <cuda_fp16.h>\n";
+    decl_stream << "__device__ half max(const half a, const half b)\n"
```
Do we know which operators we have to overload as such? "max" is one of them. Do we need others?
For now, max is the only function I have found that needs to be overloaded.
BTW, I have a question about the checks: why does this commit fail to build today? It built successfully yesterday.
Hi, we saw more failures while trying to run a full ResNet.
I think we are missing all the reduce ops. Would it be possible for you to help with this? (In a separate PR; this one is good to go.)
Hi @cchung100m @anijain2305
Force-pushed from 1dc314e to 09ea9ab.

Commit message:

Fix the following error when compiling a float16 model:

```
/tmp/tmpz_0pydlm/my_kernel.cu(9890): error: more than one instance of overloaded function "max" matches the argument list:
            function "max(int, int)"
            function "max(unsigned int, unsigned int)"
            function "max(int, unsigned int)"
            function "max(unsigned int, int)"
            function "max(long, long)"
            function "max(unsigned long, unsigned long)"
            function "max(long, unsigned long)"
            function "max(unsigned long, long)"
            function "max(long long, long long)"
            function "max(unsigned long long, unsigned long long)"
            function "max(long long, unsigned long long)"
            function "max(unsigned long long, long long)"
            function "max(float, float)"
            argument types are: (half, __half)
```

Squashed commits:
- add max(half, half) support when enable fp16
- fix cpplint error
- add max(half, half) support when enable fp16
- fix cpplint error, replace tab with whitespace
- add unittest for vector_max
- add unittest for vector_max
- add max(half, half) support when enable fp16
- add max(half, half) support when enable fp16
@ZQPei please also add the test case to this PR.
Hi @ZQPei, please first post in discuss.tvm.ai and provide more details of what you're doing. Currently, it is not clear where the problem is.
Hi, just a ping to get this in :)
Fix the error shown above, which occurs when compiling a float16 model with CUDA.
Please check!